support Apache Arrow as a normalized data representation #2115
…ften not the case)
@@ -70,7 +92,7 @@ export function percentile(reduce) {

 // If the values are specified as a typed array, no coercion is required.
 export function coerceNumbers(values) {
-  return values instanceof TypedArray ? values : map(values, coerceNumber, Float64Array);
+  return isNumberArray(values) ? values : map(values, coerceNumber, Float64Array);
(This fixes coerceNumbers for BigInt arrays, which is important because Apache Arrow uses BigInt64Array for dates… somewhat unnecessarily in my opinion, when Float64Array would work better in practice.)
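To make the distinction concrete, here is a minimal sketch of the check, assuming isNumberArray simply excludes the two BigInt typed arrays. The helper bodies below are illustrative assumptions and are not copied from the diff; in particular, Float64Array.from stands in for the map helper used above.

```javascript
// Illustrative sketch; helper bodies are assumptions, not the PR's code.
const TypedArray = Object.getPrototypeOf(Uint8Array);

// True for typed arrays whose elements are plain numbers (not BigInt).
function isNumberArray(values) {
  return (
    values instanceof TypedArray &&
    !(values instanceof BigInt64Array) &&
    !(values instanceof BigUint64Array)
  );
}

// Coerce a single value to a number; null and undefined become NaN.
function coerceNumber(x) {
  return x == null ? NaN : Number(x);
}

// Number-typed arrays pass through untouched; everything else,
// including BigInt64Array, is copied into a Float64Array.
function coerceNumbers(values) {
  return isNumberArray(values) ? values : Float64Array.from(values, coerceNumber);
}
```

With the plain `values instanceof TypedArray` test, a BigInt64Array would pass through unchanged, and any later arithmetic mixing its BigInt elements with numbers would throw a TypeError.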
I think this works! I will close #2030 because the description and discussion here are much better.
If we convert on the fly all the eligible
Nothing to see there. More interestingly, we see crashes with:
… and that is all! So I guess these are two places that need an additional arrayify.
[FIXED] Code for the global conversion: in options.js, replace dataify with this version:
import {tableFromJSON} from "apache-arrow";

// Like arrayify, but also allows data to be an Apache Arrow Table.
export function dataify(data) {
  return isArrowTable(data)
    ? data
    : Array.isArray((data = arrayify(data))) && data.length > 0 && isObject(data[0]) && !data[0].type
    ? tableFromJSON(data)
    : data;
}
We'll have to review all the remaining occurrences of
It would probably be good to have some kind of compatibility mode for Arrow that returns dates instead of numbers. Jeff suggested this before.
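One way such a compatibility step could look is sketched below. It assumes the column arrives as a BigInt64Array of milliseconds since the UNIX epoch (the unit is an assumption; Arrow timestamps can use other units), and coerceArrowDates is a hypothetical helper name, not something defined in this PR.

```javascript
// Hypothetical helper: convert an Arrow-style BigInt64Array of epoch
// milliseconds into an array of Date objects. The millisecond unit is an
// assumption; other timestamp units would need scaling first.
function coerceArrowDates(values) {
  return Array.from(values, (v) => new Date(Number(v)));
}
```

Converting through Number loses precision only beyond 2^53 milliseconds, which is far outside the range a Date can represent anyway.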
Great job testing and finding these remaining incompatibilities, @Fil! Thank you.
I reviewed the other places you identified and optimized the handling of Arrow tables. PTAL!
I've added a (convoluted) test for the code in tree.js, and a fix. I've also added tests for some of the other places we identified. Note that the comparator sort was not tested at all (with neither arrays nor Arrow) until now!
Looks good to me! Please approve if you agree.
This allows an Apache Arrow Table to be used directly as the normalized representation of mark data (computed in mark.initialize), rather than requiring conversion to an array. This requires patching a few places to support both types, and then finally in valueof we materialize the columns as arrays using table.getChild and vector.toArray, which should be as efficient as possible.

It’s a bit tricky without type checking to make sure we haven’t missed things, but it doesn’t seem too bad, and we should be able to find the places with tests.
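As a rough illustration of that last step, the sketch below shows a simplified valueof that handles only string field names. The duck-typed isArrowTable conditions and the mockTable stand-in are assumptions made so the example runs without apache-arrow; only getChild and toArray correspond to the Arrow methods named above.

```javascript
// Duck-typed check; the exact conditions are an assumption for illustration.
function isArrowTable(value) {
  return (
    value != null &&
    typeof value.getChild === "function" &&
    typeof value.toArray === "function"
  );
}

// Simplified valueof handling only string field names.
function valueof(data, field) {
  if (isArrowTable(data)) {
    const vector = data.getChild(field); // column vector, or null if absent
    return vector == null ? undefined : vector.toArray();
  }
  return Array.from(data, (d) => d[field]);
}

// Hypothetical stand-in for an Arrow Table, exposing the same two methods.
const mockTable = {
  columns: {x: new Float64Array([1, 2, 3])},
  getChild(name) {
    const values = this.columns[name];
    return values === undefined ? null : {toArray: () => values};
  },
  toArray() {
    return Array.from(this.columns.x, (x) => ({x}));
  }
};
```

Here valueof(mockTable, "x") returns the column's typed array directly, while valueof([{x: 1}, {x: 2}], "x") falls back to reading the field from each row object.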
This also uncovered a number of preexisting problems with BigInt coercion (since Arrow represents dates as BigInt).
Fixes #191.
Fixes observablehq/framework#1376.
Supersedes #2030.
Supersedes #2096.